EDS 213 Discussion 2


Data Cleaning

Database #1

Date           Location  PM2_5    NO2  O3
2020-01-01     Site A    12.3     20   15
2020/02/01     Site B    fifteen  30   20
March 3, 2020  Site C    7.8      25   twenty
NA             Site A    9        NA   30
2020-04-01     Site D    10.2     40   35

Sample_Date  Site_Name  pH   Turbidity  Lead_Concentration
01-01-2020   River X    6.5  3.2        0.05
2020-02-01   Lake Y     7    2.8        0.07
03-03-2020   River X    7.5  NA         0.1
2020-Apr-04  Lake Z     7.2  3.5        0.04
2020-05-01   River X    8.1  300.0      0.030 ppm

Database #1 Cleaning

  • Standardize date formats
  • Convert PM2_5 to numeric (e.g., "fifteen" to 15)
  • Ensure NAs are stored as true missing values, not text, so numeric columns stay numeric
  • Use consistent column names across tables (e.g., Date vs. Sample_Date, Location vs. Site_Name)
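
The cleaning steps above can be sketched in a few lines of Python, using only the standard library. This is a minimal illustration under stated assumptions, not the method used in the discussion: the helper names (clean_date, clean_number, rename_columns), the list of date formats, the word-to-number lookup, and the choice of Date and Location as the shared column names are all hypothetical. It also assumes that dates such as 03-03-2020 are month-day-year, which the sample rows alone cannot confirm.

```python
from datetime import datetime

# Date formats seen in the sample rows (assumed list; extend as needed).
# "%m-%d-%Y" assumes month-first order, which the sample data cannot confirm.
# "%B" and "%b" match English month names under the default C/English locale.
DATE_FORMATS = ["%Y-%m-%d", "%Y/%m/%d", "%B %d, %Y", "%m-%d-%Y", "%Y-%b-%d"]

# Hypothetical lookup covering only the spelled-out numbers in the sample rows.
NUMBER_WORDS = {"fifteen": 15.0, "twenty": 20.0}

# Hypothetical mapping from the second table's column names to the first's.
COLUMN_NAMES = {"Sample_Date": "Date", "Site_Name": "Location"}


def clean_date(text):
    """Return the date as YYYY-MM-DD, or None if missing or unparseable."""
    if text is None or text.strip() in ("", "NA"):
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None


def clean_number(text):
    """Return a float, or None for missing or non-numeric values."""
    if text is None or text.strip() in ("", "NA"):
        return None
    value = text.strip()
    if value.lower() in NUMBER_WORDS:
        return NUMBER_WORDS[value.lower()]
    try:
        # Keep only the leading token so "0.030 ppm" becomes 0.03.
        return float(value.split()[0])
    except ValueError:
        return None


def rename_columns(row):
    """Rename keys of a row dict to the shared column names."""
    return {COLUMN_NAMES.get(key, key): value for key, value in row.items()}


# Usage on one row from each table.
air_row = {"Date": "March 3, 2020", "Location": "Site C", "PM2_5": "7.8"}
water_row = rename_columns({"Sample_Date": "2020-Apr-04", "Site_Name": "Lake Z"})

print(clean_date(air_row["Date"]))     # 2020-03-03
print(clean_date(water_row["Date"]))   # 2020-04-04
print(clean_number("fifteen"))         # 15.0
print(clean_number("NA"))              # None
```

Returning None for "NA" keeps missing values distinct from text, so a column can be loaded as numeric with NULLs. Note that this sketch only fixes types and formats; it does not flag suspicious values such as the Turbidity reading of 300.0.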